Online blind speech separation using multiple acoustic speaker tracking and time-frequency masking
Abstract
Separating the speech signals of multiple simultaneous talkers in a reverberant enclosure is known as the cocktail party problem. Real-time applications require online solutions that separate the signals as they are observed, in contrast to separating them offline after observation. A talker may also move, which the separation system should take into account. This work proposes an online method for speaker detection, speaker direction tracking, and speech separation. The separation is based on multiple acoustic source tracking (MAST) using Bayesian filtering and time-frequency masking. Measurements from three room environments with varying amounts of reverberation, recorded with two different microphone array designs, are used to evaluate the capability of the method to separate up to four simultaneously active speakers. Separation of moving talkers is also considered. Results are compared to two reference methods: ideal binary masking (IBM) and oracle tracking (O-T). Simulations are used to evaluate the effect of the number of microphones and their spacing.
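As an illustration of the time-frequency masking idea and of the ideal binary masking (IBM) reference mentioned in the abstract, the following is a minimal sketch and not the paper's implementation. It assumes one common textbook variant of the IBM, in which each time-frequency bin is assigned to whichever source has the largest magnitude there, and it assumes the individual source spectrograms are known (which is why the mask is called "ideal"). The function names `ideal_binary_masks` and `apply_mask` and the array layout are hypothetical choices made for this example.

```python
import numpy as np


def ideal_binary_masks(source_mags):
    """Compute one binary mask per source (assumed dominant-source IBM variant).

    source_mags: array of shape (n_sources, n_freq, n_frames) holding the
    magnitude spectrogram of each source as observed in isolation.
    Returns a boolean array of the same shape; an entry is True where that
    source has the largest magnitude in the time-frequency bin.
    """
    source_mags = np.asarray(source_mags)
    n_sources = source_mags.shape[0]
    # Index of the dominant source in every time-frequency bin.
    dominant = np.argmax(source_mags, axis=0)
    # Compare against each source index to get one mask per source.
    source_ids = np.arange(n_sources)[:, None, None]
    return dominant[None, :, :] == source_ids


def apply_mask(mixture_stft, mask):
    """Keep only the mixture bins selected by the mask; zero the rest."""
    return np.asarray(mixture_stft) * mask


# Toy example: two sources, two frequency bins, two frames.
toy_sources = np.array([
    [[1.0, 0.2], [0.5, 0.1]],
    [[0.3, 0.9], [0.4, 0.8]],
])
toy_masks = ideal_binary_masks(toy_sources)
toy_mixture = toy_sources.sum(axis=0)
toy_estimate_0 = apply_mask(toy_mixture, toy_masks[0])
```

In this variant every bin belongs to exactly one source, so the masks partition the mixture spectrogram. An online method such as the one described in the abstract cannot use the true sources and would have to estimate its masks from other information, for example tracked speaker directions.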
Similar resources
Modulation domain blind source separation for noisy speech mixture
In this paper, we propose a noise-robust blind speech separation (BSS) method by using two microphones. We first use modulation domain real and imaginary spectral subtraction (MRISS) to enhance both magnitude and phase spectra of the speech mixture inputs. We then estimate the direction of arrivals (DOAs) of the speech sources and perform time-acoustic-modulation frequency masking to recover th...
Evaluations on underdetermined blind source separation in adverse environments using time-frequency masking
The successful implementation of speech processing systems in the real world depends on its ability to handle adverse acoustic conditions with undesirable factors such as room reverberation and background noise. In this study, an extension to the established multiple sensors degenerate unmixing estimation technique (MENUET) algorithm for blind source separation is proposed based on the fuzzy c-...
Continuous time-frequency masking method for blind speech separation with adaptive choice of threshold parameter using ICA
We propose a novel method for blind speech separation using continuous time-frequency masking. The method is equipped with an adaptive choice of a threshold parameter that is based on utilization of ICA methods. We present a direct application that consists in the speech segregation for automatic transcription of spoken broadcasts disturbed by background music. Experimental results show improve...
Towards single-channel unsupervised source separation of speech mixtures: the layered harmonics/formants separation-tracking model
Speaker models for blind source separation are typically based on HMMs consisting of vast numbers of states to capture source spectral variation, and trained on large amounts of isolated speech. Since observations can be similar between sources, inference relies on sequential constraints from the state transition matrix which are, however, quite weak. To avoid these problems, we propose a strat...
Blind Signal Separation and Speech Recognition in the Frequency Domain
In this paper it is shown that a Blind Signal Separation (BSS) method in the frequency domain (FDBSS) improves significantly the speaker Signal to Interference Ratio (SIR) and the phoneme recognition score of a continuous speech, speaker-independent acoustic decoder in a multi-simultaneous-speaker office environment. Specifically, the efficiency of the presented FDBSS method is studied on a TIT...
Journal title:
- Computer Speech & Language
Volume 27
Publication date: 2013